TorchCode

Project URL: duoan/TorchCode
Introduction: 🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

🔥 TorchCode

Crack the PyTorch interview.

Practice implementing operators and architectures from scratch — the exact skills top ML teams test for.

Like LeetCode, but for tensors. Self-hosted. Jupyter-based. Instant feedback.

PyTorch Jupyter Docker Python License: MIT



🎯 Why TorchCode?

Top companies (Meta, Google DeepMind, OpenAI, etc.) expect ML engineers to implement core operations from memory on a whiteboard. Reading papers isn't enough — you need to be able to write softmax, LayerNorm, MultiHeadAttention, and full Transformer blocks in code.

TorchCode gives you a structured practice environment with:

| Feature | Description |
| --- | --- |
| 🧩 40 curated problems | The most frequently asked PyTorch interview topics |
| ⚖️ Automated judge | Correctness checks, gradient verification, and timing |
| 🎨 Instant feedback | Colored pass/fail per test case, just like competitive programming |
| 💡 Hints when stuck | Nudges without full spoilers |
| 📖 Reference solutions | Study optimal implementations after your attempt |
| 📊 Progress tracking | What you've solved, best times, and attempt counts |
| 🔄 One-click reset | Toolbar button to reset any notebook back to its blank template — practice the same problem as many times as you want |
| 🔗 Open in Colab | Every notebook has an "Open in Colab" badge + toolbar button — run problems in Google Colab with zero setup |

No cloud. No signup. No GPU needed. Just make run — or try it instantly on Hugging Face.


🚀 Quick Start

Option 0 — Try it online (zero install)

Launch on Hugging Face Spaces — opens a full JupyterLab environment in your browser. Nothing to install.

Or open any problem directly in Google Colab — every notebook has an Open In Colab badge.

Option 0b — Use the judge in Colab (pip)

In Google Colab, install the judge from PyPI so you can run check(...) without cloning the repo:

!pip install torch-judge

Then in a notebook cell:

from torch_judge import check, status, hint, reset_progress
status()           # list all problems and your progress
check("relu")      # run tests for the "relu" task
hint("relu")       # show a hint

Option 1 — Pull the pre-built image (fastest)

docker run -p 8888:8888 -e PORT=8888 ghcr.io/duoan/torchcode:latest

If the registry image is unavailable for your platform, use Option 2 instead. This is the common path on Apple Silicon / arm64.

Option 2 — Build locally

make run

make run will try the prebuilt image first and automatically fall back to a local build when needed.

Open http://localhost:8888 — that's it. Works with both Docker and Podman (auto-detected).


📋 Problem Set

Frequency: 🔥 = very likely in interviews, ⭐ = commonly asked, 💡 = emerging / differentiator

🧱 Fundamentals — "Implement X from scratch"

The bread and butter of ML coding interviews. You'll be asked to write these without torch.nn.

| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |
| --- | --- | --- | --- | --- | --- |
| 1 | ReLU | relu(x) | Easy | 🔥 | Activation functions, element-wise ops |
| 2 | Softmax | my_softmax(x, dim) | Easy | 🔥 | Numerical stability, exp/log tricks |
| 16 | Cross-Entropy Loss | cross_entropy_loss(logits, targets) | Easy | 🔥 | Log-softmax, logsumexp trick |
| 17 | Dropout | MyDropout (nn.Module) | Easy | 🔥 | Train/eval mode, inverted scaling |
| 18 | Embedding | MyEmbedding (nn.Module) | Easy | 🔥 | Lookup table, weight[indices] |
| 19 | GELU | my_gelu(x) | Easy | ⭐ | Gaussian error linear unit, torch.erf |
| 20 | Kaiming Init | kaiming_init(weight) | Easy | ⭐ | std = sqrt(2/fan_in), variance scaling |
| 21 | Gradient Clipping | clip_grad_norm(params, max_norm) | Easy | ⭐ | Norm-based clipping, direction preservation |
| 31 | Gradient Accumulation | accumulated_step(model, opt, ...) | Easy | 💡 | Micro-batching, loss scaling |
| 40 | Linear Regression | LinearRegression (3 methods) | Medium | 🔥 | Normal equation, GD from scratch, nn.Linear |
| 3 | Linear Layer | SimpleLinear (nn.Module) | Medium | 🔥 | y = xW^T + b, Kaiming init, nn.Parameter |
| 4 | LayerNorm | my_layer_norm(x, γ, β) | Medium | 🔥 | Normalization, running stats, affine transform |
| 7 | BatchNorm | my_batch_norm(x, γ, β) | Medium | ⭐ | Batch vs layer statistics, train/eval behavior |
| 8 | RMSNorm | rms_norm(x, weight) | Medium | ⭐ | LLaMA-style norm, simpler than LayerNorm |
| 15 | SwiGLU MLP | SwiGLUMLP (nn.Module) | Medium | ⭐ | Gated FFN, SiLU(gate) * up, LLaMA/Mistral-style |
| 22 | Conv2d | my_conv2d(x, weight, ...) | Medium | 🔥 | Convolution, unfold, stride/padding |
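For a flavor of this track, here is a minimal, numerically stable softmax in the spirit of problem 2. This is only a sketch assuming the my_softmax signature listed above; the repo's reference solution may differ:

```python
import torch

def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract the per-slice max so exp() never overflows; softmax is
    # invariant to a constant shift along the reduced dimension.
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp = shifted.exp()
    return exp / exp.sum(dim=dim, keepdim=True)

x = torch.tensor([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]])
print(my_softmax(x, dim=-1))  # second row stays finite: [1/3, 1/3, 1/3]
```

Without the max-subtraction trick, exp(1000.0) overflows to inf and the second row becomes NaN — exactly the edge case an interviewer (and the judge) will probe.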

🧠 Attention Mechanisms — The heart of modern ML interviews

If you're interviewing for any role touching LLMs or Transformers, expect at least one of these.

| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |
| --- | --- | --- | --- | --- | --- |
| 23 | Cross-Attention | MultiHeadCrossAttention (nn.Module) | Medium | ⭐ | Encoder-decoder, Q from decoder, K/V from encoder |
| 5 | Scaled Dot-Product Attention | scaled_dot_product_attention(Q, K, V) | Hard | 🔥 | softmax(QK^T/√d_k)V, the foundation of everything |
| 6 | Multi-Head Attention | MultiHeadAttention (nn.Module) | Hard | 🔥 | Parallel heads, split/concat, projection matrices |
| 9 | Causal Self-Attention | causal_attention(Q, K, V) | Hard | 🔥 | Autoregressive masking with -inf, GPT-style |
| 10 | Grouped Query Attention | GroupQueryAttention (nn.Module) | Hard | ⭐ | GQA (LLaMA 2), KV sharing across heads |
| 11 | Sliding Window Attention | sliding_window_attention(Q, K, V, w) | Hard | ⭐ | Mistral-style local attention, O(n·w) complexity |
| 12 | Linear Attention | linear_attention(Q, K, V) | Hard | 💡 | Kernel trick, φ(Q)(φ(K)^T V), O(n·d²) |
| 14 | KV Cache Attention | KVCacheAttention (nn.Module) | Hard | 🔥 | Incremental decoding, cache K/V, prefill vs decode |
| 24 | RoPE | apply_rope(q, k) | Hard | 🔥 | Rotary position embedding, relative position via rotation |
| 25 | Flash Attention | flash_attention(Q, K, V, block_size) | Hard | 💡 | Tiled attention, online softmax, memory-efficient |
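To illustrate the track, here is a compact causal scaled dot-product attention in the spirit of problems 5 and 9 — a sketch, not the repo's reference solution:

```python
import math
import torch

def causal_attention(Q, K, V):
    # Shapes assumed (batch, heads, seq, d_k); scores scaled by 1/sqrt(d_k).
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    # Mask future positions with -inf before the softmax (GPT-style).
    seq = Q.size(-2)
    mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=Q.device), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ V

Q = K = V = torch.randn(2, 4, 8, 16)
out = causal_attention(Q, K, V)
print(out.shape)  # torch.Size([2, 4, 8, 16])
```

The -inf entries become exact zeros after the softmax, so each position attends only to itself and earlier positions — the autoregressive property the judge checks for.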

๐Ÿ—๏ธ Architecture & Adaptation โ€” Put it all together

| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |
| --- | --- | --- | --- | --- | --- |
| 26 | LoRA | LoRALinear (nn.Module) | Medium | ⭐ | Low-rank adaptation, frozen base + BA update |
| 27 | ViT Patch Embedding | PatchEmbedding (nn.Module) | Medium | 💡 | Image → patches → linear projection |
| 13 | GPT-2 Block | GPT2Block (nn.Module) | Hard | ⭐ | Pre-norm, causal MHA + MLP (4x, GELU), residual connections |
| 28 | Mixture of Experts | MixtureOfExperts (nn.Module) | Hard | ⭐ | Mixtral-style, top-k routing, expert MLPs |
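As an illustration of the adaptation problems, a minimal LoRALinear sketch: a frozen base layer plus a trainable low-rank BA update. The r and alpha defaults are illustrative choices, not the repo's:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                   # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(32, 64))
y = layer(torch.randn(4, 32))
print(y.shape)  # torch.Size([4, 64])
```

Initializing B to zero means the adapted layer starts out exactly equal to the frozen base, so fine-tuning begins from the pretrained behavior.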

โš™๏ธ Training & Optimization

| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |
| --- | --- | --- | --- | --- | --- |
| 29 | Adam Optimizer | MyAdam | Medium | ⭐ | Momentum + RMSProp, bias correction |
| 30 | Cosine LR Scheduler | cosine_lr_schedule(step, ...) | Medium | ⭐ | Linear warmup + cosine annealing |
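The warmup-plus-cosine schedule of problem 30 fits in a few lines. The parameters after step below (max_steps, warmup_steps, max_lr, min_lr) are my guesses at the elided arguments, for illustration only:

```python
import math

def cosine_lr_schedule(step, max_steps, warmup_steps, max_lr, min_lr=0.0):
    # Linear warmup from 0 up to max_lr over the first warmup_steps steps.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # Then cosine anneal from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

for s in (0, 99, 500, 1000):
    print(s, cosine_lr_schedule(s, max_steps=1000, warmup_steps=100, max_lr=1e-3))
```

At step 99 the rate peaks at max_lr; at step 1000, cos(π) = -1 and the rate bottoms out at min_lr.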

🎯 Inference & Decoding

| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |
| --- | --- | --- | --- | --- | --- |
| 32 | Top-k / Top-p Sampling | sample_top_k_top_p(logits, ...) | Medium | 🔥 | Nucleus sampling, temperature scaling |
| 33 | Beam Search | beam_search(log_prob_fn, ...) | Medium | 🔥 | Hypothesis expansion, pruning, eos handling |
| 34 | Speculative Decoding | speculative_decode(target, draft, ...) | Hard | 💡 | Accept/reject, draft model acceleration |
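Nucleus sampling (problem 32) can be sketched as follows; the k, p, and temperature parameters are hypothetical fill-ins for the elided part of the signature:

```python
import torch

def sample_top_k_top_p(logits, k=50, p=0.9, temperature=1.0):
    logits = logits / temperature
    # Top-k: drop everything below the k-th largest logit.
    kth = torch.topk(logits, k).values[..., -1, None]
    logits = logits.masked_fill(logits < kth, float("-inf"))
    # Top-p (nucleus): keep the smallest prefix of sorted tokens whose
    # cumulative probability reaches p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    sorted_probs = torch.softmax(sorted_logits, dim=-1)
    cum_before = sorted_probs.cumsum(dim=-1) - sorted_probs  # mass *before* each token
    sorted_logits = sorted_logits.masked_fill(cum_before >= p, float("-inf"))
    logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)
    return torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)

logits = torch.tensor([[2.0, 1.0, 0.5, -1.0]])
print(sample_top_k_top_p(logits, k=3, p=0.9).shape)  # torch.Size([1, 1])
```

Masking on the cumulative mass *before* each token guarantees the top token always survives, even when its probability alone already exceeds p.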

🔬 Advanced — Differentiators

| # | Problem | What You'll Implement | Difficulty | Freq | Key Concepts |
| --- | --- | --- | --- | --- | --- |
| 35 | BPE Tokenizer | SimpleBPE | Hard | 💡 | Byte-pair encoding, merge rules, subword splits |
| 36 | INT8 Quantization | Int8Linear (nn.Module) | Hard | 💡 | Per-channel quantize, scale/zero-point, buffer vs param |
| 37 | DPO Loss | dpo_loss(chosen, rejected, ...) | Hard | 💡 | Direct preference optimization, alignment training |
| 38 | GRPO Loss | grpo_loss(logps, rewards, group_ids, eps) | Hard | 💡 | Group relative policy optimization, RLAIF, within-group normalized advantages |
| 39 | PPO Loss | ppo_loss(new_logps, old_logps, advantages, clip_ratio) | Hard | 💡 | PPO clipped surrogate loss, policy gradient, trust region |
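The PPO clipped surrogate loss (problem 39) is compact enough to show in full — a sketch assuming 1-D tensors of per-token log-probabilities and advantages, matching the signature above:

```python
import torch

def ppo_loss(new_logps, old_logps, advantages, clip_ratio=0.2):
    # Probability ratio pi_new / pi_old, computed in log space for stability.
    ratio = torch.exp(new_logps - old_logps)
    clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio)
    # Pessimistic (min) of the clipped and unclipped surrogate objectives,
    # negated so that minimizing the loss maximizes the objective.
    return -torch.min(ratio * advantages, clipped * advantages).mean()

lp = torch.randn(6)
print(ppo_loss(lp, lp, torch.ones(6)))  # ratio = 1, so loss = -mean(advantages) = -1.0
```

The clamp is what keeps policy updates inside the trust region: once the ratio leaves [1 - clip_ratio, 1 + clip_ratio], the gradient through the clipped branch vanishes.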

โš™๏ธ How It Works

Each problem has two notebooks:

| File | Purpose |
| --- | --- |
| 01_relu.ipynb | ✍️ Blank template — write your code here |
| 01_relu_solution.ipynb | 📖 Reference solution — check when stuck |

Workflow

1. Open a blank notebook           →  Read the problem description
2. Implement your solution         →  Use only basic PyTorch ops
3. Debug freely                    →  print(x.shape), check gradients, etc.
4. Run the judge cell              →  check("relu")
5. See instant colored feedback    →  ✅ pass / ❌ fail per test case
6. Stuck? Get a nudge              →  hint("relu")
7. Review the reference solution   →  01_relu_solution.ipynb
8. Click 🔄 Reset in the toolbar   →  Blank slate — practice again!

In-Notebook API

from torch_judge import check, hint, status

check("relu")               # Judge your implementation
hint("causal_attention")    # Get a hint without full spoiler
status()                    # Progress dashboard — solved / attempted / todo

📅 Suggested Study Plan

Total: ~12–16 hours spread across 3–4 weeks. Perfect for interview prep on a deadline.

| Week | Focus | Problems | Time |
| --- | --- | --- | --- |
| 1 | 🧱 Foundations | ReLU → Softmax → CE Loss → Dropout → Embedding → GELU → Linear → LayerNorm → BatchNorm → RMSNorm → SwiGLU MLP → Conv2d | 2–3 hrs |
| 2 | 🧠 Attention Deep Dive | SDPA → MHA → Cross-Attn → Causal → GQA → KV Cache → Sliding Window → RoPE → Linear Attn → Flash Attn | 3–4 hrs |
| 3 | 🏗️ Architecture + Training | GPT-2 Block → LoRA → MoE → ViT Patch → Adam → Cosine LR → Grad Clip → Grad Accumulation → Kaiming Init | 3–4 hrs |
| 4 | 🎯 Inference + Advanced | Top-k/p Sampling → Beam Search → Speculative Decoding → BPE → INT8 Quant → DPO Loss → GRPO Loss → PPO Loss + speed run | 3–4 hrs |

๐Ÿ›๏ธ Architecture

┌──────────────────────────────────────────┐
│          Docker / Podman Container       │
│                                          │
│  JupyterLab (:8888)                      │
│    ├── templates/  (reset on each run)   │
│    ├── solutions/  (reference impl)      │
│    ├── torch_judge/ (auto-grading)       │
│    ├── torchcode-labext (JLab plugin)    │
│    │     🔄 Reset — restore template     │
│    │     🔗 Colab — open in Colab        │
│    └── PyTorch (CPU), NumPy              │
│                                          │
│  Judge checks:                           │
│    ✓ Output correctness (allclose)       │
│    ✓ Gradient flow (autograd)            │
│    ✓ Shape consistency                   │
│    ✓ Edge cases & numerical stability    │
└──────────────────────────────────────────┘

Single container. Single port. No database. No frontend framework. No GPU.
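A toy version of those judge checks, for intuition only — torch_judge's real implementation surely differs, and check_impl / my_relu are hypothetical names:

```python
import torch

def check_impl(fn, ref, *shapes):
    """Toy grader: compare outputs with allclose and verify gradients flow."""
    for shape in shapes:
        x = torch.randn(*shape, requires_grad=True)
        out = fn(x)
        assert out.shape == ref(x).shape, "shape mismatch"          # shape consistency
        assert torch.allclose(out, ref(x), atol=1e-6), "wrong values"  # correctness
        out.sum().backward()                                        # gradient flow via autograd
        assert x.grad is not None, "no gradient reached the input"

my_relu = lambda x: torch.clamp(x, min=0)
check_impl(my_relu, torch.relu, (4,), (2, 3))
print("all checks passed")
```

A correct implementation passes silently; a wrong one trips one of the assertions, which is the pass/fail signal the colored feedback is built on.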

๐Ÿ› ๏ธ Commands

make run    # Build & start (http://localhost:8888)
make stop   # Stop the container
make clean  # Stop + remove volumes + reset all progress

🧩 Adding Your Own Problems

TorchCode uses auto-discovery — just drop a new file in torch_judge/tasks/:

TASK = {
    "id": "my_task",
    "title": "My Custom Problem",
    "difficulty": "medium",
    "function_name": "my_function",
    "hint": "Think about broadcasting...",
    "tests": [ ... ],
}

No registration needed. The judge picks it up automatically.


📦 Publishing torch-judge to PyPI (maintainers)

The judge is published as a separate package so Colab/users can pip install torch-judge without cloning the repo.

Automatic (GitHub Action)

Pushing to master after changing the package version triggers .github/workflows/pypi-publish.yml, which builds and uploads to PyPI. No git tag is required.

  1. Bump version in torch_judge/_version.py (e.g. __version__ = "0.1.1").
  2. Configure PyPI Trusted Publisher (one-time):
    • PyPI → Your project torch-judge → Publishing → Add a new pending publisher
    • Owner: duoan, Repository: TorchCode, Workflow: pypi-publish.yml, Environment: (leave empty)
    • Run the workflow once (push a version bump to master or Actions → Publish torch-judge to PyPI → Run workflow); PyPI will then link the publisher.
  3. Release: commit the version bump and git push origin master.

Alternatively, use an API token: add repository secret PYPI_API_TOKEN (value = pypi-... from PyPI) and set TWINE_USERNAME=__token__ and TWINE_PASSWORD from that secret in the workflow if you prefer not to use Trusted Publishing.

Manual

pip install build twine
python -m build
twine upload dist/*

Version is in torch_judge/_version.py; bump it before each release.


โ“ FAQ

Do I need a GPU?
No. Everything runs on CPU. The problems test correctness and understanding, not throughput.
Can I keep my solutions between runs?
Blank templates reset on every make run so you practice from scratch. Save your work under a different filename if you want to keep it. You can also click the 🔄 Reset button in the notebook toolbar at any time to restore the blank template without restarting.
Can I use Google Colab instead?
Yes! Every notebook has an Open in Colab badge at the top. Click it to open the problem directly in Google Colab — no Docker or local setup needed. You can also use the Colab toolbar button inside JupyterLab.
How are solutions graded?
The judge runs your function against multiple test cases using torch.allclose for numerical correctness, verifies gradients flow properly via autograd, and checks edge cases specific to each operation.
Who is this for?
Anyone preparing for ML/AI engineering interviews at top tech companies, or anyone who wants to deeply understand how PyTorch operations work under the hood.

๐Ÿค Contributors

Thanks to everyone who has contributed to TorchCode.

duoan · Ando233 · ThierryHJ

Auto-generated from the GitHub contributors graph with avatars and GitHub usernames.


Built for engineers who want to deeply understand what they build.

If this helped your interview prep, consider giving it a ⭐


☕ Buy Me a Coffee

Scan the QR code to support.
